Leiden algorithm

The Leiden algorithm is a community detection algorithm developed by Traag et al

at Leiden University. It was developed as a modification of the

Louvain method to address the issues with disconnected communities.

Graph components

Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph.

Vertices and edges

A graph is composed of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let

{\displaystyle V}

be the set of vertices, and

{\displaystyle E}

be the set of edges:

{

…

}

{

…

}

{\displaystyle {\begin{aligned}V&:=\{v_{1},v_{2},\dots ,v_{n}\}\\E&:=\{e_{ij},e_{ik},\dots ,e_{kl}\}\end{aligned}}}

where

{\displaystyle e_{ij}}

is the directed edge from vertex

{\displaystyle v_{i}}

to vertex

{\displaystyle v_{j}}

. We can also write this as an ordered pair:

(

)

{\displaystyle {\begin{aligned}e_{ij}&:=(v_{i},v_{j})\end{aligned}}}

Community

A community is a unique set of nodes:

⊆

⋂

∅

∀

≠

{\displaystyle {\begin{aligned}C_{i}&\subseteq V\\C_{i}&\bigcap C_{j}=\emptyset ~\forall ~i\neq j\end{aligned}}}

and the union of all communities must be the total set of vertices:

⋃

{\displaystyle {\begin{aligned}V&=\bigcup _{i=1}C_{i}\end{aligned}}}

Partition

A partition is the set of all communities:

{

…

}

{\displaystyle {\begin{aligned}{\mathcal {P}}&=\{C_{1},C_{2},\dots ,C_{n}\}\end{aligned}}}

Quality

Similar to modularity, the quality function is used to assess how well the communities have been allocated. The Leiden algorithm uses the Constant Potts Model (CPM):

(

)

∑

∈

(

)

−

(

)

{\displaystyle {\begin{aligned}{\mathcal {H}}(G,{\mathcal {P}})&=\sum _{C\in {\mathcal {P}}}|E(C,C)|-\gamma {\binom {||C||}{2}}\end{aligned}}}

Algorithm

The Leiden algorithm is similar to that of the Louvain method, with some important modifications.

Step 1: First, each node in the network is assigned to its own community.

Step 2: Next, we decide which communities to move the nodes into and update the partition

{\displaystyle {\mathcal {P}}}

queue = V(G) # create a queue from the nodes

while queue != empty:

node = queue.next() # get the next node

delta_H = 0

for C in communities: # compute the change in quality for each community

if delta_H(node, C) > delta_H:

delta_H = delta_H(node, C)

community = C

if delta_H > 0:

move node to community

outside_nodes = { node_i | (node, node_i) are edges and node_i is not in community } # find the nodes which are connected to the node but not in the community

queue.add(outside_nodes not already in queue)

Step 3: Assign each node in the graph to its own community in a new partition called

refined

{\displaystyle {\mathcal {P}}_{\text{refined}}}

Step 4: The goal of this step is to separate poorly-connected communities:

for C in communities of P:

# find the nodes in the community which have lots of edges within the community

well_connected_nodes = { node | node is in C, |E(node, C\node)| >= gamma ||node||(||C|| - ||node||) }

for node in well_connected_nodes:

if node is singleton under P_refined:

well_connected_communities = { C_i | C_i is in P_refined, C_i is a subset of C, |E(C_i, C\C_i)| >= gamma*||C_i||(||C|| - ||C_i||)

for C_i in well_connected_communities:

compute probability P(C_i) # 0 if assignment of node to C_i decreases quality of P_refined, greater weights for greater quality increases

assign node to C_i by sampling P(C_i) distribution

Step 5: Use the refined partition

refined

{\displaystyle {\mathcal {P}}_{\text{refined}}}

to aggregate the graph. Each community in

refined

{\displaystyle {\mathcal {P}}_{\text{refined}}}

becomes a node in the new graph

agg

{\displaystyle G_{\text{agg}}}

Example: Suppose that we have:

{

}

{

}

{

}

{

}

refined

{

}

{\displaystyle {\begin{aligned}V&=\{v_{1},v_{2},v_{3},v_{4},v_{5},v_{6},v_{7}\}\\C_{1}&=\{v_{1},v_{2},v_{3},v_{4}\}\\C_{2}&=\{v_{5},v_{6},v_{7}\}\\{\mathcal {P}}&=\{C_{1},C_{2}\}\\{\mathcal {P}}_{\text{refined}}&=\{C_{1a},C_{1b},C_{2}\}\end{aligned}}}

Then our new set of nodes will be:

{

↦

}

{\displaystyle {\begin{aligned}V_{agg}&=\{C_{1a}\mapsto w_{1a},C_{1b}\mapsto w_{1b},C_{2}\mapsto w_{2}\}\end{aligned}}}

Step 6: Update the partition

{\displaystyle {\mathcal {P}}}

using the aggregated graph. We keep the communities from partition

{\displaystyle {\mathcal {P}}}

, but the communities can be separated into multiple nodes from the refined partition

refined

{\displaystyle {\mathcal {P}}_{\text{refined}}}

{

⊆

∈

(

agg

)

}

∈

}

{\displaystyle {\begin{aligned}{\mathcal {P}}&=\{\{v~|~v\subseteq C,v\in V(G_{\text{agg}})\}~|~C\in {\mathcal {P}}\}\end{aligned}}}

Example: Suppose that

{\displaystyle C}

is a poorly-connected community from the partition

{\displaystyle {\mathcal {P}}}

{

}

{

}

{\displaystyle {\begin{aligned}C&=\{v_{1},v_{2},v_{3},v_{4},v_{5}\}\\{\mathcal {P}}&=\{C\}\end{aligned}}}

Then suppose during the refinement step, it was separated into two communities,

{\displaystyle C_{1}}

and

{\displaystyle C_{2}}

{

}

{

}

refined

{

}

{\displaystyle {\begin{aligned}C_{1}&=\{v_{1},v_{2},v_{3}\}\\C_{2}&=\{v_{4},v_{5}\}\\{\mathcal {P}}_{\text{refined}}&=\{C_{1},C_{2}\}\end{aligned}}}

When we aggregate the graph, the new nodes will be:

(

agg

)

{

}

{\displaystyle {\begin{aligned}V(G_{\text{agg}})&=\{C_{1},C_{2}\}\end{aligned}}}

but we will keep the old partition:

{

}

{\displaystyle {\begin{aligned}{\mathcal {P}}&=\{\{C_{1},C_{2}\}\}\end{aligned}}}

Step 7: Repeat Steps 2 - 6 until each community consists of only one node.

== References ==